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Abstract:  Intrinsically  disordered  protein  regions  that  populate  the  so-called  “Dark  Proteome” 
offer  challenging  benchmarks  of  conformational  sampling  methods  and  their  all-atom  force 
fields  plus  solvent  descriptions  to  accurately  model  structural  transitions  on  a  multidimensional 
energy  landscape.  This  work  explores  the  application  of  parallel  tempering  methods  with 
implicit  solvent  models  as  a  computational  framework  to  capture  the  conformational  ensemble  of 
an  intrinsically  disordered  peptide  derived  from  the  Ebola  virus  protein  VP35.  A  recent  X-ray 
crystallographic  study  reported  a  protein-peptide  interface  where  the  VP35  peptide  underwent  a 
folding  transition  from  a  disordered  form  to  a  helix-(3-turn-helix  fold  upon  molecular  association 
with  the  Ebola  protein  NP.  An  assessment  is  provided  of  the  accuracy  of  two  generalized  Born 
solvent  models  (GBMV2  and  GBSW2)  using  the  CHARMM  force  field  and  applied  with 
temperature-based  replica  exchange  dynamics  to  calculate  the  disorder  propensity  of  the  peptide 
and  its  probability  density  of  states  in  a  continuum  solvent.  A  further  comparison  is  presented  of 
applying  an  explicit/implicit  solvent  hybrid  replica  exchange  simulation  of  the  peptide. 
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1.  INTRODUCTION 

The  large  conformational  heterogeneity  and  rapid  dynamic  transitions  of  intrinsically  disordered 
peptides  and  proteins  (IDPs)  present  a  challenge  to  experimental  boundaries  in  characterizing 
their  functional  form  on  rugged  energy  landscapes  (Wright  and  Dyson,  1999;  Wright  and  Dyson, 
2005).  From  a  biological  perspective,  the  broad  interest  in  IDPs  draws  principally  from  their 
fundamental  role  in  the  regulation  and  function  of  cellular  protein  networks.  Recent 
experimental  studies  have  begun  to  unravel  the  interplay  between  “ordered  chaos”  of  IDPs  and 
the  kinetic  transition  to  a  topological  funnel  of  folded  states  (Arai  et  al.,  2015).  The 
contemporary  view  of  this  dynamic  transition  is  a  process  that  occurs  by  either  an  “induced-fit” 
of  the  IDP  upon  molecular  association  with  a  protein  target  or  by  target  “fly  casting”  of  a  pre¬ 
folded  state  in  the  disordered  conformational  ensemble  of  the  IDP  (see,  e.g.,  Shoemaker  et  al., 
2000;  Arai  et  al.,  2015). 

Complementary  to  experimental  studies  are  computer  simulations  which  offer  a  powerful 
set  of  tools  to  understand  IDPs  at  the  all-atom  level  and  their  inherent  plasticity  to  navigate  a 
disordered  network  of  microstates  (see,  e.g.,  Lee  and  Chen,  2016;  Bhowmick  et  al.,  2016; 
Chebaro  et  al.,  2015;  Zhang  and  Chen,  2014).  Among  the  simulation  methods,  the  generalized 
ensemble  sampling  method  of  temperature-based  replica  exchange  (T-ReX)  (Ishikawa  et  al., 
2001;  Sugitaa  and  Okamoto,  1999),  also  known  as  parallel  tempering,  has  become  an 
increasingly  popular  approach  for  exploring  the  energy  landscape  of  proteins.  Algorithms 
combined  with  T-ReX  to  generate  protein  configurations  vary  in  their  theoretical  formulations 
and  range  from  canonical  molecular  dynamics  (MD)  simulations  to  methods  that  accelerate 
conformational  sampling.  Of  the  latter,  examples  includes  coarse  replica-exchange  molecular 
dynamics  (Peter  et  al.,  2016),  accelerated  molecular  dynamics  (see,  e.g.,  Miao  et  al.,  2015), 
Hamiltonian  switch  Metropolis  Monte  Carlo  (Mittal  et  al.,  2014),  all-atom  multicanonical 
molecular  dynamics  (Higo  et  al.,  2011)  and  self-guided  Langevin  dynamics  (SGLD)  (Wu  and 
Brooks,  2003),  among  others. 

A  computational  strategy  of  reducing  the  complexity  of  all-atom  simulations  of  proteins 
is  the  replacement  of  explicit  water  interactions  with  a  continuum  description  of  treating 
implicitly  the  bulk  physical  properties  of  solvation  effects.  The  most  common  implicit  solvent 
method  for  protein  dynamic  simulations  is  the  generalized  Born  (GB)  approximation.  GB 
models  are  computationally  faster  than  explicit  solvent  calculations  and  differ  in  their  accuracy 
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of  reproducing  Poisson-Boltzmann  solvation  energies  for  single  protein  conformations  (see,  e.g., 
Feig  et  al.,  2003).  Application  of  GB  solvent  models  to  studies  of  IDPs  has  been  reported  by 
several  laboratories  (see,  e.g.,  Ganguly  and  Chen,  2009;  Click  et  al.,  2010;  Ganguly  and  Chen, 
2015;  Chebaro  et  al.,  2015).  To  date  the  simulation  results  lack  consensus  on  the  accuracy  of  GB 
solvent  models  as  a  computational  framework  to  capture  the  fold  propensities  of  IDPs  and  their 
probability  density  of  states  on  a  conformational  landscape.  Particularly  missing  among  the 
reported  studies  are  comparative  assessments  of  GB  models  of  IDPs  with  those  modeled  by 
explicit  all-atom  solvent  replica  exchange  simulations. 

Given  the  current  interests  in  characterizing  the  so-called  “Dark  Proteome”  which 
consists  of  “invisible”  conformational  states  within  the  human,  viral  and  microbial  genomes 
(Bhowmick  et  al.,  2016;  Perdigaoa  et  al.,  2015),  this  work  presents  temperature-based  replica 
exchange  simulations  of  modeling  an  IDP  derived  from  an  Ebola  virus  protein.  Ebola  viruses  are 
nonsegmented  negative  sense  RNA  viruses  that  cause  severe  hemorrhagic  fever  (Sanchez  et  al., 
2006).  An  X-ray  crystallographic  structure  was  reported  by  Amarasinghe  and  coworkers  (Leung 
et  al.,  2015)  of  the  Ebola  nucleoprotein  NP  in  complex  with  a  28-residue  peptide  extracted  from 
Ebola  VP35  (designated  as  the  NPBP  peptide).  The  NP-VP35  viral  assembly  is  essential  for 
virus  replication  and  offers  a  protein  target  for  therapeutic  development.  Experimental  data 
reveals  the  NPBP  peptide  binds  NP  with  high  affinity  and  specificity,  and  acts  by  blocking  NP 
oligomerization.  The  peptide  undergoes  a  folding  transition  from  a  disordered  form  free  in 
solution  to  a  helix-P-turn-helix  fold  upon  molecular  association  with  NP  (Leung  et  al.,  2015). 

Two  different  generalized  ensemble  sampling  methods  are  applied  based  on  combining 
T-ReX  with  the  SGLD  simulation  method  (Lee  and  Olson,  2010)  and  two  different  GB  solvent 
models  (Lee  et  al.,  2003;  Im  et  al.,  2003)  are  examined  to  assess  their  accuracy  in  modeling  the 
density  of  states  of  the  NPBP  peptide.  One  of  the  sampling  techniques  is  the  conventional 
application  of  T-ReX  with  a  static  set  of  temperatures.  The  other  technique  is  an  adaptive  T- 
ReX  where  the  replica  clients  dynamically  walk  in  temperature  space  in  search  of  the  optimal 
population  density  on  a  modeled  energy  function  (Trebst  et  al.,  2006;  Katzgraber  et  al.,  2006; 
Lee  and  Olson,  2011;  Olson  and  Lee,  2014;  Olson  et  al.,  2016).  The  two  GB  models  differ  in 
their  dielectric-boundary  descriptions  with  one  of  them  constructed  from  an  analytic  formulation 
of  the  molecular  volume  (Lee  et  al.,  2003). 
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The  final  simulation  model  applied  to  the  NPBP  peptide  is  an  explicit/implicit  solvent 
hybrid  T-ReX/MD  method  (Chaudhury  et  ah,  2012).  The  application  of  this  simulation  model  is 
to  investigate  the  effect  of  solvent  modeling  resolution  on  the  helix  propensity  and  the  search  of 
conformational  transitions.  The  idea  behind  the  hybrid  model  is  reducing  the  number  of  replica 
clients  needed  in  explicit  solvent  simulations  by  replacing  the  contribution  of  explicit  solvent 
energies  in  the  Metropolis  exchanges  (Metropolis  et  ah,  1953)  with  those  of  an  accurate  GB 
solvent  approximation.  The  hybrid  model  allows  the  same  number  of  replica  clients  to  be 
applied  as  in  the  GB  solvent  T-ReX/SGLD  simulations  of  the  NPBP  peptide  while  retaining  a 
higher  resolution  in  conformational  sampling  on  an  explicit  solvent  landscape  (Chaudhury  et  al., 
2012;  Olson  and  Lee,  2013). 

2.  COMPUTATIONAL  METHODS 

This  section  provides  a  brief  outline  of  the  computational  methods  applied  in  this  work  of 
modeling  the  NPBP  peptide  taken  from  the  PDB  4YPI  (Figure  1).  A  general  approach  for 
conformational  sampling  is  the  application  of  T-ReX  (see,  e.g.,  Ishikawa  et  al.,  2001).  Unlike 
the  well-established  method  of  MD  simulations  at  a  single  sampling  temperature,  T-ReX  is  a 
generalized  ensemble  method  of  applying  multiple  parallel  simulations  in  which  each  replica  is 
executed  at  a  different  temperature.  In  traditional  applications  of  T-ReX,  the  temperatures  (7), 
T2,  ...,  T„),  where  n  is  the  number  of  replica  clients,  are  pre-determined  by  a  static  (fixed)  set  of 
values  that  span  a  desired  range.  It  is  common  to  model  the  set  of  temperatures  by  a 
geometrically  spaced  sequence  (Predescu  et  al.,  2004)  using  n  -  1  intervals  from  the  minimum 
temperature  denoted  as  7)  =  Tmm  to  the  maximum  Tn  =  Tmax 

>  a) 

where  7)  the  temperature  of  the  ith  replica  client  in  Figure  1. 

An  alternative  to  Eq.  (1)  is  an  adaptive  replica  exchange  method  of  allowing  the  clients  to 
dynamically  walk  in  temperature  space  (Trebst  et  al.,  2006;  Katzgraber  et  al.,  2006;  Lee  and 
Olson,  2011;  Olson  and  Lee,  2014;  Olson  et  al.,  2016).  In  implementing  the  adaptive  algorithm, 
each  client  is  tagged  as  either  “cold”  or  “hot”  depending  on  the  last  temperature  extreme  it 
visited  (Lee  and  Olson,  2011).  Tracing  of  the  clients  is  made  by  constructing  histograms  over 
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temperature  space,  «coId  ( T )  and/zhot  (  T  ) ,  where  each  bin  accumulates  the  number  of  cold  and  hot 

clients,  respectively,  visiting  each  temperature  window.  The  fraction  cold,/  of  a  client  window 
at  temperature  T  is  the  number  of  cold  clients  visiting  that  temperature  divided  by  the  total 
number  of  cold  and  hot  client  visits: 

"cold  (^) 

"cold(r)+nhot(ry  {  j 

Using  th ef(T)  tenn,  a  thennal  current  is  defined  (Lee  and  Olson,  2011) 


i-D(T)n(T)%. 


(3) 


where  D[T) is  the  diffusivity  and  // ( T )  i s  the  probability  that  any  client  will  reside  at 
temperature  T.  The  thermal  current  can  be  maximized  by  adjusting  the  temperatures  such  that 
/(/)  increases  linearly  as  a  function  of  temperature  index,  i.  Here,  a  continuous /(T)  is 
constructed  from  the  computed  values  of/ at  the  current  set  of  temperatures,  /,  and  then  search 
for  the  new  temperatures  where  f{T)=  i/(N- 1).  To  prevent  all  of  the  windows  from  clustering 

around  the  same  temperature  and  depleting  exchanges  at  the  extremes,  a  constraint  is  applied 
where  no  neighboring  temperatures  can  be  more  than  two  geometric  spacing  units  apart, 


l!+l 


< 


V  ^min  J 


2 

_am_ 


(4) 


with  the  lower  and  upper  values  of  /  set  to  Tmm  and  Tmax,  respectively. 

The  exchange  of  temperatures  between  neighboring  replica  clients,  a  and  b,  is  determined 
by  a  probability  given  by  the  Metropolis  energy  criteria  (Metropolis  et  ah,  1953) 


p[a  <->b)  =  min 


U 


,(/UA)(4-U 


(5) 


where  /?„=!/  kBTa,  kB  is  Boltzmann’s  constant ,  Ta  is  the  temperature  of  replica  client  a,  and  Ea  is 
the  potential  energy  of  client  a. 


SGLD  Simulation  Models 

For  generating  trajectories  of  the  NPBP  peptide,  two  methods  were  combined  with  T-ReX.  The 
first  is  based  on  the  SGLD  simulation  method  developed  by  Wu  and  Brooks  (2003).  The  SGLD 
equation  of  motion  is  given  by 
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p.=f.-T.p.  +  R.  +  Xg.,  (6) 


where  p  defines  the  rate  of  change  of  the  momentum  of  particle  i,  f .  is  the  force  acting  on  the 
particle,  y .  is  the  friction  constant,  R .  defines  the  random  force  and  g .  is  a  memory  function, 


which  is  scaled  by  an  ad  hoc  guiding  factor  X.  The  memory  function  gfis  defined  by  the 
moving  average  of  the  momentum  seen  by  the  system  over  an  interval  of  time,  L: 


(7) 


where  (. . .)  denotes  a  local  average.  The  time  interval  is  further  defined  as L=t  / 5 1 ,  where  t 

is  the  local  averaging  time  anddt  the  time  step  along  the  simulation  trajectory.  It  should  be  noted 
that  because  of  the  ad  hoc  force  in  Eq.  (6),  the  sampling  algorithm  deviates  from  a  canonical 
ensemble  (Lee  and  Olson,  2010;  Wu  and  Brooks,  2011;  Wu  et  ah,  2012;  Wu  et  al.,  2016).  For 
this  work,  the  deviation  is  anticipated  to  be  small  for  modeling  a  peptide  (Lee  and  Olson,  2010), 
nevertheless  the  sampling  distributions  can  be  reweighted  to  remove  the  applied  bias  (Wu  and 
Brooks,  2011). 

In  the  SGLD  simulations,  solvent  was  represented  by  implicit  solvent  models  GBMV2 
(generalized  Bom  molecular  volume  version  2)  (Lee  et  al.,  2002;  Lee  et  al.,  2003)  and  the 
GBSW2  (generalized  Born  smoothing  window  version  2)  (Im  et  al,  2003).  The  most  noted 
difference  between  the  two  models  is  representation  of  the  solvent  excluded  volume  and  the 
treatment  of  the  dielectric  interface.  The  GBMV2  parameters  were  selected  to  smooth  the 
molecular  volume  by  setting  P  =  -12  and  P3  =  0.65  (Yeh  et  al.,  2008).  The  hydrophobic 
cavitation  term  was  modeled  by  applying  a  phenomenological  surface  tension  coefficient  set  to  a 
value  of  0.015  kcal/mol/A2.  For  applying  GBSW2,  the  model  was  parameterized  to  fit  the  Lee- 
Richards  molecular-surface  Poisson  results  and  required  w  =  0.2  A,  a o  =  1.2045  and  a\  =  0.1866. 
The  hydrophobic  cavitation-energy  tension  term  was  set  to  0.030  kcal/(mol'A2). 

The  utilities  and  programming  libraries  of  the  Multiscale  Modeling  Tools  for  Structural 
Biology  (MMTSB)  (Feig  et  al.,  2004)  were  used  to  carry  out  the  T-ReX/SGLD  simulations.  The 
CHARMM  simulation  program  (version  c35b2)  was  applied  as  a  modeling  platform  (Brooks  et 
al.,  2009).  Simulations  were  carried  out  using  24  replica  clients  and  the  frequency  of  exchanges 
was  set  to  every  1  ps  of  simulation.  Temperatures  were  set  where  Tmm  =  300  K  and  Tmax  =  475 
K.  Because  the  implicit  solvent  models  GBMV2  and  GBSW2  were  originally  developed  for 


6 


UNCLASSIFIED 


TR-17-018  DISTRIBUTION  STATEMENT  A:  Approved  for  public  release;  distribution  is  unlimited. 


Generalized  Born  Solvent  Descriptions  of  the  Dark  Proteome 


and  have  been  extensively  benchmarked  with  the  CHARMM22  force  field,  this  force  field  was 
applied  with  the  CMAP  backbone  dihedral  cross-term  extension  (Mackerell  et  ah,  2004).  An 
integration  time  step  of  2  fs  was  used  and  parameters  for  SGLD  consisted  of  the  friction  constant 
set  to  y  of  1  ps'1  for  all  heavy  atoms,  the  guiding  factor  X  to  a  value  of  1,  and  the  averaging  time 
was  set  to  1  ps.  These  values  were  taken  from  our  previous  studies  of  the  SGLD  model  (Lee 

and  Olson,  2010;  Lee  and  Olson,  2011;  Olson  and  Lee,  2014).  Non-bonded  interaction  cutoff 
parameters  for  electrostatics  and  vdW  terms  were  set  at  a  radius  of  22  A  with  a  2- A  potential 
switching  function.  Covalent  bonds  between  the  heavy  atoms  and  hydrogen  atoms  were 
constrained  by  the  SHAKE  algorithm  (Ryckaert  et  ah,  1977).  The  NPBP  peptide  was  modeled 
for  200  ns  of  simulation  time  per  client,  generating  an  ensemble  of  4.8  ps. 


Hybrid  Simulation  Model 

The  second  method  applied  for  generating  trajectories  of  the  NPBP  peptide  is  an  explicit/implicit 
solvent  hybrid  T-ReX/MD  simulation  (Chaudhury  et  ah,  2012).  In  a  typical  explicit  solvent  T- 
ReX  simulation  the  energies  are  given  by 


T?  _  ryprot  ,  tt prot-solv  .  rrsolv-solv 

^explicit  all-atom  all-atom  all-atom  ’ 


(8) 


where  the  first  tenn  describes  the  protein  potential  energy  for  a  CHARMM-based  molecular 
mechanics  force  field,  the  second  tenn  is  the  explicit  protein-solvent  interactions  followed  by  the 
explicit  solvent-solvent  interactions.  The  all-atom  solvent-solvent  energy  tenn  requires 
significant  number  of  replica  exchange  clients  to  achieve  adequate  Metropolis  updates 
(Chaudhury  et  ah,  2012).  In  the  hybrid  T-ReX  method,  the  dynamics  of  each  replica  moves  on 
an  explicit  solvent  landscape.  During  a  Metropolis  update,  all  waters  are  removed  from  a  replica 
and  the  solvent  energy  term  of  the  replica  is  calculated  using  the  grid-based  GBMV2  solvent 
model 


P  TT prot  .  A  ■r'rprot-solv 

implicit  —  U  all-atom  “l“  Zi'~rGBMV2  ’ 


(9) 


where  A GS,«  is  the  free-energy  term  due  to  the  implicit  solvent  contribution. 


The  NAMD  code  (Phillips  et  ah,  2005)  was  applied  for  the  200-ns  T-ReX/MD 
simulation  with  the  CHARMM22+CMAP  force  field.  The  simulation  cubic  box  size  was  set  to 
53.19  A3  and  the  number  of  waters  was  4796.  For  modeling  the  waters  the  TIP3P  potential  was 
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applied  (Jorgensen  et  al.,  1983).  Nosc'-Hoovcr  thennostat  was  applied  with  a  temperature 
coupling  constant  of  50  kcal/s".  Since  the  additional  computational  expense  of  the  hybrid  model 
relative  to  implicit  solvent  calculations,  the  NAMD  simulation  parameters  differ  slightly  from 
the  T-ReX/SGLD  simulations  in  that  a  smaller  cutoff  distance  of  12  A  was  applied  with  a 
switching  distance  of  8  A.  The  integration  time  step  remained  identical  to  that  used  with  the 
SGLD  simulations  and  the  SHAKE  algorithm  was  similarly  applied.  Particle  mesh  Ewald  was 
applied  and  combined  with  periodic  boundary  conditions. 


Evaluation  Metrics 


To  examine  the  trajectories  generated  by  the  simulations,  the  weighted  histogram  analysis 
method  (WHAM)  (Ferrenberg  and  Swendsen,  1989;  Kumar  et  al.,  1992;  Shea  et  al.,  1998; 
Gallicchio  et  al.,  2005)  was  applied  to  the  data  sets.  The  2D  density  of  states,  Q(qi,q2),  for  a 
molecular  system,  where  qi  and  q2  are  a  set  of  reaction  coordinates  of  interest,  is  given  by 


R 

77  no) 

ni  exP  [fi  -P,£) 

j= i 

where  Uj  is  the  number  of  data  points  in  the  /th  simulation  and  [1,  and  7)  are  Boltzmann's  constant 
and  temperature  of  the  /th  simulation,  respectively.  The  function  /V,(q ,  ,q2)  is  the  histogram  of 
(qi,q2)  calculated  from  the  z'th  simulation,  and  fj  is  the  scaled  free  energy  obtained  by  solving  the 
following  equations  self-consistently, 


^  (Hi  5  H2  )  R 

I 


and 


2>,(qi,q2)exp(-p£) 

^(qpq2)  =  -fcl-8 - 

exp  (A- PA) 


(11) 


exp(-A)=  ZQ(cli’cl2)exp(-p^),  (12) 

where  Hp(qi,q2)  is  the  probability  density  at  the  inverse  temperature  p.  For  calculations 
presented  here,  qi  =  fractional  helicity  (/h)  of  the  peptide  determined  from  DSSP  (Kabsch  and 
Sanders,  1983)  and  q2  =  radius  of  gyration  (Rg). 
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The  trajectories  were  further  analyzed  by  a  Q  score  for  the  peptide.  Q  is  the  number  of 
side-chain  contacts  in  a  generated  conformation  divided  by  the  total  number  equivalent  contacts 
in  the  X-ray  crystal  structure  of  NPBP.  Values  were  computed  for  side-chain  center-of-mass 
pairs  (if),  such  that  j  >  i  and  whose  distances  are  less  than  a  cutoff  of  4.2  A.  A  sigmoidal 
function  was  applied  (implemented  in  MMTSB)  to  effectively  include  residue  pairs  that  are 
slightly  further  apart  with  a  reduced  weight.  In  addition  to  a  Q  score,  pairwise  Ca  root-mean- 
square-deviation  (RMSD)  from  the  starting  X-ray  structure  was  computed  for  each  peptide 
structure  in  a  generated  ensemble  of  conformations. 

3.  RESULTS  AND  DISCUSSION 

Figure  2  illustrates  the  X-ray  crystallographic  structure  of  the  NPBP  peptide  extracted  from  the 
Ebola  virus  VP35  in  association  with  the  Ebola  NP  protein  (Leung  et  ah,  2015).  The  binding  of 
NPBP  occupies  a  functionally  critical  site  on  NP  required  for  RNA  synthesis  and  the  peptide 
confonnation  is  stabilized  by  a  network  of  electrostatic  interactions  dominated  by  NP  residues 
Arg240,  Lys248,  and  Asp252.  Using  the  DSSP  secondary  structure  algorithm,  NPBP  (annotated 
as  residues  20-47)  shows  segments  Trp28  to  Thr35  and  Val40  to  Asp42  as  distinct  helical 
conformations.  The  overall  /h  is  0.4  and  the  bound  form  exhibits  an  Rg  of  10.5  A. 

Experimental  characterization  of  the  secondary  structure  of  the  NPBP  peptide  free  in 
solution  by  circular  dichroism  (CD)  spectroscopy  is  reported  to  show  the  peptide  as  intrinsically 
disordered  (Leung  et  ah,  2015).  When  added  to  a  solution  of  50  %  trifluoroethanol  (TFE),  the 
NPBP  peptide  transitions  from  a  coil  to  helical  structures  of  approximately  30-40  %  helicity,  thus 
suggesting  a  strong  underlying  secondary-structure  propensity.  Predictions  of  secondary- 
structure  without  bias  of  the  crystallographic  structure  estimate  the  NPBP  peptide  to  encompass  a 
consensus  fa  ~0.3  with  probabilities  greater  than  0.9  for  helical  formation  in  the  sequence 
segment  of  Gly27  to  Met34  (see,  e.g.,  Kieslich  et  ah,  2016). 

To  examine  the  accuracy  of  implicit  solvent  models  to  counterbalance  the  network  of 
electrostatic  interactions  of  the  viral  assembly  interface  that  contribute  to  the  stabilization  of  the 
NPBP  helical  fold  and  produce  a  conformational  landscape  with  a  predisposed  helix  propensity 
in  bulk  water,  replica-exchange  simulations  were  perfonned  using  different  simulation  strategies. 
The  conformational  sampling  approach  of  SGLD  was  explored  with  two  different  GB  solvent 
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models  and  two  different  temperature-based  replica-exchange  methods.  The  first  simulation 
model  result  shown  in  Figure  2b  is  the  SGLD-GBMV2  with  a  static  (fixed)  set  of  temperatures  in 
defining  the  replica-exchange  protocol.  The  2D  free-energy  profile  Q(/k,Rg)  computed  at  300  K 
using  WHAM  of  the  full  ensemble  shows  a  large  manifold  of  conformational  substates  with  a 
helix  distribution  o l'/i i  —  0  to  0.5.  Several  representative  structures  extracted  from  the  basins  are 
illustrated  in  Figure  2e.  The  confonnational  density  takes  place  in  Rg  space  of  roughly  8-11  A 
and  at  the  lower  end  of  the  population  distribution  non-structured  states  are  observed  to  occupy  a 
large  range  of  Rg  values  and  show  the  canonical  feature  of  disorder. 

Given  the  broad  population  distribution  produced  by  a  static  set  of  temperatures  in  the  T- 
ReX  simulations,  it  is  important  to  test  whether  the  simulation  model  provided  adequate 
sampling  of  the  basins.  To  address  this  issue,  an  adaptive  replica-exchange  SGLD-GBMV2 
simulation  model  was  applied  whereby  the  clients  walk  in  temperature  space  to  optimize  the 
efficiency  of  exchanges  between  nearest-neighbor  thermal  windows  at  potential  energy  barriers 
separating  conformational  states  (Olson  et  al.,  2016;  Olson  and  Lee,  2014;  Lee  and  Olson,  2011). 
The  2D  profile  from  the  adaptive  T-ReX  is  illustrated  in  Figure  2c  and  the  result  is  shown  to 
retain  the  manifold  of  transient  states  of  those  sampled  by  the  static  T-ReX  method,  yet  a 
population  shift  is  observed  toward  an  fu  ~0.5  at  the  cost  of  reducing  the  density  of  unstructured 
conformations.  The  theoretical  goal  of  the  adaptive  method  is  to  enhance  sampling  of 
conformational  transitions  for  a  modeled  potential  energy  surface.  Early  success  of  the  method 
applied  to  a  sharp  phase  transition  of  unfolding-folding  of  the  protein  SH3  showed  better 
agreement  with  the  experimental  melting  temperature  than  the  traditional  static  approach 
calculated  over  an  identical  length  of  simulation  time  (Lee  and  Olson,  2011).  The  adaptive 
method  also  captured  with  greater  accuracy  the  native  state  of  SH3  extracted  from  the 
conformational  ensemble.  Given  these  prior  outcomes,  and  while  the  NPBP  certainly  lacks  the 
folding  cooperativity  of  SH3,  the  result  suggests  for  CHARMM22+CMAP/GBMV2  a  “native” 
state  of  helix  propensity  near  the  value  observed  experimentally  for  the  crystallographic  bound 
conformation  and  one  which  is  inconsistent  with  the  CD  analysis  in  free  solution.  Because  the 
potential  energy  surface  is  the  same  between  the  static  and  adaptive  T-ReX  methods,  the  less- 
efficient  sampling  approach  will  eventually  converge  to  find  a  comparable  Q(/k,Rg). 

To  determine  the  bias  of  the  GBMV2  solvent  approximation  on  Q(/h,Rg),  adaptive  T-ReX 
simulations  were  performed  with  a  different  implicit  solvent  model  based  on  the  GBSW2 
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approximation.  Of  the  GB-based  solvent  models  developed  for  protein  dynamics,  GBMV2  is 
one  of  the  most  accurate  models  in  reproducing  Poisson-Boltzmann  theory  with  a  Lee-Richards 
molecular  surface  (Feig  et  ah,  2003).  The  basis  of  GBMV2  is  an  analytic  formulation  of  the 
molecular  volume  (Lee  et  ah,  2003),  whereas  the  less  accurate  but  computationally  much  faster 
GBSW2  model  is  based  on  a  smooth  dielectric-boundary  formulation  constructed  by  applying  a 
superposition  of  atomic-centered  polynomials  (Im  et  ah,  2003).  The  dissimilarities  between  the 
two  models  are  clearly  illustrated  in  Figure  2d.  Application  of  GBSW2  significantly  reduces  the 
number  of  high-probability  conformational  excursions  and  leads  to  a  folding  funnel  at  fn  ~0.5. 
While  the  “optimized”  fu  from  the  two  different  implicit  solvent  models  is  surprisingly  similar, 
the  limited  disorder  from  the  GBSW2  model  in  its  current  parameterized  implementation  makes 
this  solvent  approximation  less  suitable  for  modeling  IDPs  (for  an  alternative  parameterization  of 
GBSW,  see,  e.g.,  Chen  2010). 

Figure  3  shows  the  probabilities  of  observing  Rg  as  a  function  of  three  ensemble  sampling 
temperatures.  The  GBMV2  model  produced  more  compact  states  of  NPBP  than  the 
crystallographic  bound  form,  while  GBSW2  yielded  Rg  values  less  collapsed.  This  observation 
can  be  partially  attributed  to  the  distinction  in  molecular  surface  representations  between  the 
solvent  models,  where  different  weights  are  applied  to  the  surface-tension  term  that  describes  the 
hydrophobic  free  energy.  In  general,  MD  simulations  of  unfolded  states  are  more  compact  and 
tend  to  favor  helical  structures  than  those  found  experimentally  (Piana  et  ah,  2014).  By 
example,  an  experimental  Rg  for  a  unfolded  28  amino  acids  is  estimated  to  be  13  A  (Kohn  et  ah, 
2004). 

Also  shown  in  Figure  3  are  the  probability  profiles  of  Ca-RMSD  and  the  fraction  of  side- 
chain  contacts  similar  to  the  starting  conformation  of  NPBP.  The  ensemble  average  over 
contacts  is  denoted  as  <Q>  and  values  less  than  0.6  are  considered  unrelated  to  the  starting 
structure.  When  combined  with  the  analysis  of  the  2D  profiles,  the  probabilities  provide  an 
interesting  picture  of  the  rare  event  of  recognizing  a  peptide  conformation  in  the  ensemble  that  is 
similar  to  the  NPBP  bound  form.  For  the  GBMV2  model  and  considering  only  the  last  50  ns  of 
simulation  time,  the  lowest  RMSD  is  2.9  A  with  Q  =  0.6,  and  is  clustered  in  the  outer  periphery 
of  the  highly-populated  basin  labeled  as  III  in  Figure  2c.  This  sparse  cluster  of  low  RMSD  states 
emerges  with  an  fu  of  0.5  and  Rg  approaching  10  A. 
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It  is  also  important  to  understand  the  configurational  stability  of  IDPs  from  simulations 
and  their  helix  propensities.  The  thennal  unfolding  profiles  for  NPBP  are  shown  in  Figure  4a. 
Consistent  with  the  reduced  number  of  transient  states  and  their  populations  among  the  GB 
models,  GBSW2  retains  helicity  over  a  greater  thermal  range.  The  congregation  of  replica 
clients  in  the  range  of  360  K  to  425  K  for  the  adaptive  method  (GBMV2  and  GBSW2)  is  the 
effect  of  enhanced  sampling  of  transition  points  that  stabilize  helix  formation.  The  statistical 
errors  in  the  histograms  for  all  model  simulations  are  approximately  fu  ±  0.1  along  the 
temperature  contour.  Simulation  convergence  and  the  dominance  of  helix  fonnation  in  NPBP 
can  be  further  tested  by  conducting  T-ReX  simulations  starting  from  a  random  coil  state  rather 
than  the  bound  helix-P-turn-helix  conformation.  Although  these  latter  simulations  were  executed 
only  to  100  ns  using  the  adaptive  method.  Figure  4b  shows  convergence  to  a  folded  state  of 
helical  confonnations  and  establishes  the  strong  helix  propensity  of  applying  the  GB  solvent 
descriptions. 

The  overweighting  of  secondary  structure  biases  from  the  GBMV2  and  GBSW2  solvent 
models  is  comparable  to  other  studies  of  using  different  GB  solvent  models  and 
parameterizations  (Chebaro  et  ah,  2015;  Ganguly  and  Chen,  2009;  Click  et  al.,  2010).  As  a 
further  test  of  the  impact  of  the  GBMV2  solvent  model  and  its  mean-field  resolution  of  smearing 
out  the  details  of  the  solvent  on  sampling  confonnational  transitions  of  NPBP,  the  final 
simulation  model  tested  is  an  explicit/implicit  solvent  hybrid  T-ReX/MD  method.  This  model 
generates  peptide  configurations  on  an  explicit  solvent  (TIP3P)  landscape  while  using  the  same 
number  of  replica  clients  as  in  the  implicit  solvent  calculations.  The  latter  is  achieved  by  using 
GBMV2  in  the  Metropolis  exchanges  rather  than  explicit  solvent.  It  is  worth  noting  that,  while  it 
is  not  the  goal  to  determine  unconstrained  folding  free  energies  to  high  accuracy,  replacement  of 
energies  from  an  all-atom  representation  to  a  mean-field  approximation  can  produce  errors  in  the 
detailed  balance  required  of  a  canonical  ensemble  (Chaudhury  et  al.,  2012). 

Figure  5  shows  Q(/h,/?g)  at  300  K  from  the  WHAM  calculation  of  the  hybrid  simulation 
model  ensemble  and  the  thermal  unfolding  profile.  Several  important  observations  can  be  made 
in  comparison  to  the  static  GBMV2  model  which  best  corresponds  to  the  non-adaptive  hybrid 
model.  The  most  important  distinction  between  the  results  is  the  striking  difference  in  the 
favorable  free  energies  and  their  network  that  shuttle  conformations  among  the  helical  basins. 
While  the  hybrid  and  GBMV2  model  show  sufficient  plasticity  among  the  states,  the  hybrid 
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model  shows  a  free-energy  minimum  at  a  slightly  lower  fa  =  0.26  vs.  0.37,  and  yields  good 
agreement  with  secondary-structure  predictions.  The  distinction  in  the  potentials  of  mean  force 
among  the  models  can  be  illustrated  by  considering  a  transition  between  an  unstructured  state 
and  the  free-energy  minimum.  For  the  static  GBMV2,  the  transition  (fa  =  0;  Rg  =  1 1  A)  — »  (fa  = 
0.37;  Rg  =  8  A)  yields  AAG  =  0.1  kcal/mol,  whereas  for  the  adaptive  model  the  transition  from 
the  same  disordered  state  —>(fa  =  0.47;  Rg  =  9  A)  AAG  =  1.0  kcal/mol,  and  for  the  hybrid  model 
the  transition  — »  (fa  =  0.26;  Rg  =  9  A)  AAG  =1.7  kcal/mol.  While  the  static  model  exhibits  a 
reversible  transition  to  unstructured  states  and  would  appear  to  be  in  better  agreement  with  the 
CD  experiments  (Leung  et  ah,  2015),  enhanced  sampling  of  Q (fa,Rg)  by  the  adaptive  method  for 
this  solvent  description  revealed  a  more  costly  transition  to  the  densely  populated^  ~0.5. 

The  lowest  RMSD  conformer  for  the  hybrid  model  via  the  last  50  ns  is  3.3  A  with  Q  = 
0.6  and  Rg  =  9.4  A.  This  confonner  is  illustrated  in  Figure  5b  as  the  first  structure  depicted  for 
the  basin  labeled  III.  The  conformation  is  fonned  from  a  helical  hairpin  of  residues  Ser26-Met34 
and  Val40-Phe44.  The  top-rank  conformer  based  on  potential  energies  for  the  free-energy 
minimum  at  fa  =  0.26  is  illustrated  as  the  first  structure  for  basin  I.  This  structure  shows  a  5- 
residue  helix  of  Trp28-Met34.  Among  the  highly  populated  sampled  basins,  a  distinction 
between  the  simulation  models  is  the  cluster  at  fa  =  -0.6,  where  the  hybrid  model  shows  an 
enhanced  free  energy  of  population.  Unlike  the  other  basins,  this  basin  lacks  a  direct  low-energy 
pathway  along  the  manifold  of  states. 

A  statistical  average  of  the  ensemble  for  the  hybrid  model  computed  from  the  multiple 
temperatures  of  the  T-ReX  simulation  is  illustrated  in  Figure  5c  along  with  a  comparison  with 
the  static  GBMV2  model.  Despite  the  differences  in  the  Q.(fa,Rg)  profiles  between  the  models,  a 
simple  statistical  average  without  reweighting  based  on  free  energies  shows  remarkably  similar 
fa  values  at  300  K.  Because  of  the  lack  of  instantaneous  relaxation  of  the  explicit  waters  in 
contrast  to  GB  approximations,  the  hybrid  model  shows  more  confined  excursions  of  unfolded 
states  at  the  upper  Rg  boundaries.  Like  many  MD  simulations  of  unfolded  states  with  explicit 
solvent  (Piana  et  ah,  2014),  a  residual  secondary-structure  propensity  is  observed  at  475  K. 

The  more  compact  favorable  states  observed  in  the  explicit/implicit  solvent  hybrid  model 
than  that  corresponding  to  the  bound  NPBP  conformation  is  unlikely  due  to  the  GB  model,  but 
rather  the  additive  force  field  (Piana  et  ah,  2014).  As  noted  above,  the  CHARMM22+CMAP 
force  field  was  selected  because  of  extensive  benchmarks  in  reported  studies  of  the  GBMV2  and 
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GBSW2  solvent  descriptions  to  successfully  model  natively  folded  structures  of  proteins  (see, 
e.g.,  Yeh  et  al.,  2008).  While  there  are  no  reported  studies  of  applying  either  GBMV2  or 
GBSW2  with  the  more  refined  CHARMM36m  force  field  and  its  parameterization  for  TIP4P- 
based  explicit  solvent  simulations  (Huang  et  al.,  2016),  switching  to  this  description  may  help 
reconcile  the  underestimated  Rg  values  with  those  experimentally  determined  for  unfolded  states 
and  reduce  the  overall  weight  and  stabilization  of  helix  propensities. 

4.  CONCLUSIONS 

The  current  initiative  to  develop  an  atomistic  understanding  of  “invisible”  confonnational  states 
of  the  human/viral/bacterial  proteomes  requires  an  accurate  computational  framework  for 
modeling  conformational  transitions  within  a  disordered  ensemble  and  their  population  density. 
The  work  presented  here  examined  the  application  of  temperature-based  replica  exchange 
simulations  with  different  sampling  methods  and  solvent  descriptions  of  modeling  an 
intrinsically  disorder  28-residue  peptide  from  the  Ebola  virus  protein  VP35.  The  X-ray 
crystallographic  conformation  of  the  VP35  peptide  bound  to  Ebola  NP  reports  a  helix-P-turn- 
helix  fold  of  roughly  40  %  helical  structure,  whereas  in  free  solution  the  peptide  is  unstructured. 
The  simulations  of  the  unbound  peptide  showed  the  selection  of  a  GB  solvent  model  combined 
with  a  replica-exchange  sampling  protocol  can  have  a  significant  effect  on  the  sampling 
populations.  Overall,  the  tested  GB  models  tend  to  favor  a  free-energy  minimum  of  roughly  50 
%  helical  content  for  the  peptide.  The  effect  of  an  adaptive  temperature -based  replica  exchange 
protocol  compared  to  a  traditional  approach  of  a  static  set  of  temperatures  was  found  to  reduce 
the  amount  of  population  disorder  and  shifted  the  ensemble  to  helical  conformations  with  an 
extended  peptide  folding  stabilization.  A  comparison  with  an  explicit/implicit  solvent  hybrid 
MD-based  replica  exchange  simulation  showed  that  conformational  sampling  on  an  explicit 
solvent  landscape  leads  to  a  free-energy  minimum  of  approximately  20  %  helicity,  yet  the  overall 
conformational  network  underlying  transient  states  resembles  more  of  a  helix-fold  propensity  in 
a  solvent  mixture  of  TFE-water  rather  than  bulk  water.  The  simulation  results  can  be 
summarized  as  a  benchmark  for  the  testing  of  more  refined  CHARMM-based  force  fields  and 
different  GB  model  parameterizations.  The  ultimate  goal  is  to  capture  greater  heterogeneity  in 
conformational  probabilities  and  reduce  the  overstabilization  of  helix  propensities  in  modeling 
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intrinsically  disordered  peptides. 
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Figure  1  |  Computational  strategies  of  modeling  the  Ebola  virus  VP35  peptide  (PDB:  4YPI)  in 
its  unbound  form  using  temperature-based  replica  exchange  (T-ReX)  simulation  methods.  The 
methods  include:  (1)  GBMV2  solvent  model  applied  with  a  traditional  (static)  set  of 
temperatures  spanning  a  range  from  a  minimum  temperature  (7Y)  to  the  upper  extreme  ( Tn ), 
where  n  is  the  number  thermal  windows  (ensemble  computing  clients);  (2)  GBMV2  using  an 
adaptive  (dynamically  walking)  set  of  temperatures  between  T\  and  Tn;  (3)  GBSW2  solvent 
model  applied  by  with  adaptive  sampling;  and  (4)  TIP3P/GBMV2  hybrid  replica  exchange 
method.  Energies  (£))  used  in  the  replica  exchanges  by  the  different  strategies  are  described  in 
the  text.  Molecular  figures  were  drawn  with  PyMOL  (www.pymol.org). 
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Figure  2  |  Simulation  results  of  sampling  the  Ebola  virus  VP35  NPBP  peptide  using  GB-based 
solvent  models  and  replica  exchange  methods,  (a)  X-ray  crystallographic  structure  of  the  NPBP 
peptide  bound  to  Ebola  NP  (depicted  as  a  molecular  surface),  (b)  Probability  density  profde 
fi( fn,Rg )  computed  from  the  static  T-ReX  simulation  method  with  order  parameters  of  fractional 
helicity  and  radius  of  gyration,  (c)  Probability  density  profile  results  from  the  adaptive  T-ReX 
method  with  the  GBMV2  solvent  model,  (d)  Adaptive  T-ReX  with  GBSW2  solvent  model,  (e) 
Representative  conformations  extracted  from  the  simulations  at  indicated  basins. 
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Figure  3  |  Calculated  probability  profiles  for  sampling  values  of  radius  of  gyration  and  Ca- 
RMSD  from  the  starting  bound  conformation  of  the  NPBP  peptide.  Plot  lines  colored  blue 
represent  quantities  extracted  at  300  K  from  the  generated  conformational  ensembles,  red 
represent  values  at  390  K  and  green  at  475  K.  From  the  top  figure  to  bottom,  simulation  results 
are  static  T-ReX/GBMV2,  adaptive  T-ReX/GBMV2  and  adaptive  T-ReX/GBSW2. 
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Figure  4  |  Thermal  unfolding  profiles  computed  from  the  simulations  of  the  Ebola  NPBP 
peptide,  (a)  Profiles  calculated  from  the  starting  folded  peptide  conformation  using  the  three 
simulation  models  of  the  static  T-ReX/GBMV2  (blue  colored  line),  adaptive  T-ReX/GBMV2 
(red  colored  line)  and  adaptive  T-ReX/GBSW2  (green  colored  line),  (b)  Profiles  calculated  from 
the  adaptive  T-ReX  simulations  of  starting  from  an  unstructured  (coil)  peptide  fold. 
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Figure  5  |  Simulation  results  of  sampling  the  Ebola  virus  VP35  NPBP  peptide  using  the 
explicit/implicit  solvent  hybrid  T-ReX/MD  method,  (a)  Probability  density  profile  Q (fa,Rg) 
computed  from  sampling  fractional  helicity  and  radius  of  gyration.  (b)  Representative 
conformations  extracted  from  the  simulations  are  illustrated  for  selected  basins,  (c)  Thermal 
unfolding  profiles  of  the  peptide  computed  using  the  explicit/implicit  solvent  hybrid  T-ReX/MD 
method  (light  colored  symbols)  compared  to  the  static  T-ReX/SGLD  method  using  GBMV2 
(blue  colored  symbols).  A  representative  structure  is  shown  from  the  explicit  solvent 
calculation. 
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